We offer
Human-Sourced, AI-Enhanced, Scientist-Reviewed,
Large-Scale Pre-Labeled Speech Datasets
Wanna get 100 hours of FREE samples?‡ We have B1G10 too: buy 1 hour of conversation data and get 10 hours of non-conversation data for FREE!
High-Quality
Unlike free or studio-recorded datasets, we also offer:
- Transcript Validation: word-level confidence scores (no hallucinations)
- Transcript Correction: proprietary methods to fix errors in human-sourced transcripts, especially named entities (e.g., names, organizations, locations, times)
- Timing Information: word- and phone-level timestamps and speaker turns
- 360° Annotation: speaker names and turns, SNRs, topics, descriptions, and more
- Label Customization: choose from existing pre-labels or request new labels
- Lifetime Curation: continuous label refinement and updates at no extra cost
Category | Olewave | Legacy | Free |
---|---|---|---|
Configurable Labels | ★★★★★ | N/A | N/A |
Data Quantity | 1k - 10M hrs | <10k hrs | <100k hrs |
Label Quality | ★★★★★ | ★★★★☆ | ★★☆☆☆ |
Data Coverage | ★★★★☆ | ★★★☆☆ | ★★☆☆☆ |
Data Naturalness | ★★★★☆ | ★★★☆☆ | ★★☆☆☆ |
Cost-Effectiveness | ★★★★★ | ★★★★☆ | ★★★☆☆ |
‡: US-based companies and institutions only. A signed NDA is required.